Abstract. The wavelet tree is a versatile data structure that serves
a number of purposes, from string processing to geometry. It can be 
regarded as a device that represents a sequence, a reordering, or a grid 
of points. In addition, its space adapts to various entropy measures of the 
data it encodes, enabling compressed representations. New competitive 
solutions to a number of problems, based on wavelet trees, are appearing 
every year. In this survey we give an overview of wavelet trees and the 
surprising number of applications in which we have found them useful: 
basic and weighted point grids, sets of rectangles, strings, permutations, 
binary relations, graphs, inverted indexes, document retrieval indexes, 
full-text indexes, XML indexes, and general numeric sequences. 

1 Introduction 

The wavelet tree was invented in 2003 by Grossi, Gupta, and Vitter [54], as a data
structure to represent a sequence and answer some queries on it. Curiously, a 
data structure that has turned out to have a myriad of applications was buried in 
a paper full of other eye-catching results. The first mention of the name "wavelet
tree" appears on page 8 of 10 [54, Sec. 4.2]. The last mention is also on page 8, 
save for a figure caption on page 9. Yet, the wavelet tree was a key tool to obtain 
the main result of the paper, a milestone in compressed full-text indexing. 

It is interesting that, after some thought, one can see that the wavelet tree
is a slight generalization of an old (1988) data structure by Chazelle [25],
heavily used in Computational Geometry. This data structure represents a set of
points on a two-dimensional grid: it describes a successive reshuffling process
where the points start sorted by one coordinate and end up sorted by the other.
Kärkkäinen, in 1999 [66], was the first to put this structure to use in the
completely different context of text indexing. Still, the concept and usage were
totally different from the one Grossi et al. would propose four years later.

We have already mentioned three ways in which wavelet trees can be
regarded: (i) as a representation of a sequence; (ii) as a representation of a
reordering of elements; (iii) as a representation of a grid of points. Since 2003,
these views of wavelet trees, and their interactions, have been fruitful in a
surprisingly wide range of problems, extending well beyond the areas of text
indexing and computational geometry where the structure was conceived.

Fig. 1. A wavelet tree on string $S$ = "alabar a la alabarda". We draw the spaces as
underscores. The subsequences of $S$ and the subsets of $\Sigma$ labeling the edges are
drawn for illustration purposes; the tree stores only the topology and the bitmaps.

Our goal in this article is to give an overview of this marvellous data structure 
and its many applications. We aim to introduce, to an audience with a general 
algorithmic background, the basic data organization used by wavelet trees, the 
information they can model, and the wide range of problems they can solve. 
We will also mention the most technical results and give the references to be 
followed by the more knowledgeable readers, advising the rest what to skip. 

Being ourselves big fans of wavelet trees, and having squeezed them out for 
several years, it is inevitable that there will be many references to our own work 
in this survey. We apologize in advance for this, as well as for oversights of others' 
results, which are likely to occur despite our efforts. 

2 Data Structure 

Let $S[1,n] = s_1 s_2 \ldots s_n$ be a sequence of symbols $s_i \in \Sigma$, where
$\Sigma = [1..\sigma]$ is called the alphabet. Then $S$ can be represented in plain form
using $n\lceil\lg\sigma\rceil = n\lg\sigma + O(n)$ bits (we use $\lg x = \log_2 x$).

Structure. A wavelet tree [54] for sequence $S[1,n]$ over alphabet $[1..\sigma]$ can be
described recursively, over a sub-alphabet range $[a..b] \subseteq [1..\sigma]$. A wavelet
tree over alphabet $[a..b]$ is a binary balanced tree with $b - a + 1$ leaves. If $a = b$,
the tree is just a leaf labeled $a$. Else it has an internal root node, $v_{root}$, that
represents $S[1,n]$. This root stores a bitmap $B_{v_{root}}[1,n]$ defined as follows: if
$S[i] \le (a+b)/2$ then $B_{v_{root}}[i] = 0$, else $B_{v_{root}}[i] = 1$. We define
$S_0[1,n_0]$ as the subsequence of $S[1,n]$ formed by the symbols $c \le (a+b)/2$, and
$S_1[1,n_1]$ as the subsequence of $S[1,n]$ formed by the symbols $c > (a+b)/2$. Then, the
left child of $v_{root}$ is a wavelet tree for $S_0[1,n_0]$ over alphabet
$[a..\lfloor (a+b)/2 \rfloor]$ and the right child of $v_{root}$ is a wavelet tree for
$S_1[1,n_1]$ over alphabet $[1 + \lfloor (a+b)/2 \rfloor..b]$.

Fig. 1 displays a wavelet tree for the sequence $S$ = "alabar a la alabarda".
Here for legibility we are using $\Sigma$ = {' ', a, b, d, l, r}, so $n = 20$ and $\sigma = 6$.

Note that this wavelet tree has height $\lceil\lg\sigma\rceil$, and it has $\sigma$ leaves
and $\sigma - 1$ internal nodes. If we regard it level by level, it is not hard to see
that it stores exactly $n$ bits at each level, and at most $n$ bits in the last one.
Thus, $n\lceil\lg\sigma\rceil$ is an upper bound to the total number of bits it stores.
Storing the topology of the tree requires $O(\sigma\lg n)$ further bits, if we are
careful enough to use $O(\lg n)$ bits for the pointers. This extra space may be a
problem on large alphabets. We show in the paragraph "Reducing redundancy" how to
save it.

Tracking symbols. This wavelet tree represents $S$, in the sense that one can
recover $S$ from it. More than that, it is a succinct data structure for $S$, in the
sense that it takes space asymptotically equal to a plain representation of $S$,¹ and
it permits accessing any $S[i]$ in time $O(\lg\sigma)$, as follows.

To extract $S[i]$, we first examine $B_{v_{root}}[i]$. If it is a 0, we know that
$S[i] \le (\sigma+1)/2$, otherwise $S[i] > (\sigma+1)/2$. In the first case, we must
continue recursively on the left child; in the second case, on the right child. The
problem is to determine where position $i$ has been mapped to on the left (or right)
child. In the case of the left child, where $B_{v_{root}}[i] = 0$, $i$ has been mapped
to position $i_0$, which is the number of 0s in $B_{v_{root}}$ up to position $i$. For
the right child, where $B_{v_{root}}[i] = 1$, this corresponds to position $i_1$, the
number of 1s in $B_{v_{root}}$ up to position $i$. The number of 0s (resp. 1s) up to
position $i$ in a bitmap $B$ is called $rank_0(B,i)$ (resp. $rank_1(B,i)$). We continue
this process recursively until we arrive at a leaf. The label of this leaf is $S[i]$.
Note that we do not store the leaf labels; those are deduced as we successively
restrict the subrange $[a..b]$ of $[1..\sigma]$ as we descend.
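
The recursive definition and the downward tracking can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the names `Node`, `rank`, and `access` are hypothetical, symbols are mapped to integers in $[1..\sigma]$, explicit child pointers are used, and rank is computed by a naive linear scan instead of a constant-time structure.

```python
class Node:
    """Wavelet tree node over the alphabet range [a..b] (hypothetical name)."""
    def __init__(self, seq, a, b):
        self.a, self.b = a, b
        self.bits = self.left = self.right = None
        if a < b:                                   # leaves store nothing
            mid = (a + b) // 2
            self.bits = [0 if c <= mid else 1 for c in seq]
            self.left = Node([c for c in seq if c <= mid], a, mid)
            self.right = Node([c for c in seq if c > mid], mid + 1, b)

def rank(bits, bit, i):
    """Occurrences of `bit` in bits[1..i] (1-based); naive O(i) scan."""
    return sum(1 for x in bits[:i] if x == bit)

def access(root, i):
    """Return S[i] (1-based) by tracking position i from the root to a leaf."""
    v = root
    while v.a != v.b:
        bit = v.bits[i - 1]
        i = rank(v.bits, bit, i)                    # position of i in the child
        v = v.left if bit == 0 else v.right
    return v.a                                      # leaf label, deduced from the range

text = "alabar a la alabarda"
alphabet = sorted(set(text))                        # [' ', 'a', 'b', 'd', 'l', 'r']
code = {ch: k + 1 for k, ch in enumerate(alphabet)} # map symbols to [1..sigma]
S = [code[ch] for ch in text]
root = Node(S, 1, len(alphabet))
decoded = "".join(alphabet[access(root, i) - 1] for i in range(1, len(S) + 1))
```

Here `decoded` reproduces `text`, which illustrates that the bitmaps alone suffice to recover $S$.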

Operation rank was already considered by Chazelle [25], who gave a simple
data structure using $O(n)$ bits for a bitmap $B[1,n]$, that computed rank in
constant time (note that we only have to solve $rank_1(B,i)$, since
$rank_0(B,i) = i - rank_1(B,i)$). Jacobson [63] improved the space to
$n + O(n\lg\lg n/\lg n) = n + o(n)$ bits, and Golynski [48, 49] proved this space is
optimal as long as we maintain $B$ in plain form and build extra data structures on
it. The solution is, essentially, storing rank answers every $s = \lg^2 n$ bits of $B$
(using $\lg n$ bits per sample), then storing rank answers relative to the last sample
every $(\lg n)/2$ bits (using $\lg s = 2\lg\lg n$ bits per sub-sample), and using a
universal table to complete the answer to a rank query within a sub-sample. We will
use in this survey the notation $rank_b(B,i,j) = rank_b(B,j) - rank_b(B,i-1)$.
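
The two-level sampling scheme can be sketched as follows. This is a simplified illustration, not a succinct implementation: the class name `RankBitmap` is hypothetical, the samples are kept in ordinary Python lists rather than packed into $\lg n$ and $2\lg\lg n$ bits, and the universal table is replaced by a short scan of at most one sub-sample period.

```python
import math

class RankBitmap:
    """Plain bitmap plus a two-level directory of sampled rank_1 answers."""

    def __init__(self, bits):
        self.bits = list(bits)
        lg = max(2, int(math.log2(max(len(self.bits), 4))))
        self.blk = lg // 2                 # sub-sample period, about (lg n)/2
        self.sup = self.blk * 2 * lg       # sample period, about lg^2 n
        self.samples = []                  # rank_1 before each sample point
        self.subsamples = []               # rank_1 relative to the last sample
        total = rel = 0
        for p, bit in enumerate(self.bits):
            if p % self.sup == 0:
                self.samples.append(total)
                rel = 0
            if p % self.blk == 0:
                self.subsamples.append(rel)
            total += bit
            rel += bit

    def rank1(self, i):
        """Number of 1s in B[1..i] (1-based), for 0 <= i <= n."""
        if i == 0:
            return 0
        p = i - 1                          # 0-based index of the last bit counted
        start = (p // self.blk) * self.blk
        # The final scan covers at most blk bits; a universal table (or a
        # popcount instruction) would replace it in a real implementation.
        return (self.samples[p // self.sup] + self.subsamples[p // self.blk]
                + sum(self.bits[start:p + 1]))

    def rank0(self, i):
        return i - self.rank1(i)

    def rank(self, b, i, j):
        """Occurrences of bit b in B[i..j]: rank_b(B, j) - rank_b(B, i - 1)."""
        one = self.rank1 if b == 1 else self.rank0
        return one(j) - one(i - 1)

bits = [(i * i + 3 * i) % 7 % 2 for i in range(300)]   # arbitrary test bitmap
rb = RankBitmap(bits)
```

Because the sample period is a multiple of the sub-sample period, no sub-sample block straddles two samples, so a query adds exactly one value from each level plus the in-block count.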

Above, we have tracked a position from the root to a leaf, and as a
consequence we have discovered the symbol represented at the root position. It is
also useful to carry out the inverse process: given a position at a leaf, we can
track it upwards and find out where it is on the root bitmap. This is done as follows.

Assume we start at a given leaf, at position $i$. If the leaf is the left child of
its parent $v$, then the position $i'$ corresponding to $i$ at $v$ is the $i$-th
occurrence of a 0 in its bitmap $B_v$. If the leaf is the right child of its parent
$v$, then $i'$ is the position of the $i$-th occurrence of a 1 in $B_v$. This procedure
is repeated from $v$ until we reach the root, where we find the final position. The
operation of finding the $i$-th 0 (resp. 1) in a bitmap $B[1,n]$ is called
$select_0(B,i)$ (resp. $select_1(B,i)$), and it can also be solved in constant time
using the $n$ bits of $B$ plus $o(n)$ bits [27, 79]. Thus the time to track a position
upwards is also $O(\lg\sigma)$.

The constant-time solution for select [27, 79] is analogous to that of rank.
The bitmap is cut into blocks with $s$ 1s. Those that are long enough to store
all their answers within sublinear space are handled in this way. The others are
not too long (i.e., $O(\lg^{O(1)} n)$) and thus encoding positions inside them requires
fewer bits (i.e., $O(\lg\lg n)$). This permits repeating the idea recursively a second
time. The third time, the remaining blocks are so short that they can be handled in
constant time using universal tables. Golynski [48,49] reduced the $o(n)$ extra
space to $O(n\lg\lg n/\lg n)$ and proved this is optimal if $B$ is stored in plain form.

With the support for rank and select, the space required by the basic binary
balanced wavelet tree reaches $n\lceil\lg\sigma\rceil + o(n)\lg\sigma + O(\sigma\lg n)$
bits. This completes a basic description of wavelet trees; the rest of the section is
more technical.

Reducing redundancy. As mentioned, the $O(\sigma\lg n)$ term can be removed if
necessary [72,74]. We slightly alter the balanced wavelet tree shape, so that
all the leaves are grouped to the left (for this sake we divide the interval $[a..b]$
of $[1..\sigma]$ into $[a..a + 2^{\lceil\lg(b-a+1)\rceil - 1} - 1]$ and
$[a + 2^{\lceil\lg(b-a+1)\rceil - 1}..b]$). Then, all the bitmaps at all the levels
belong to consecutive nodes, and they can all be concatenated into a large bitmap
$B[1, n\lceil\lg\sigma\rceil]$. We know the bitmap of level $\ell$ starts at position
$1 + n(\ell - 1)$. Moreover, if we have determined that the bitmap of a wavelet tree
node corresponds to $B[l,r]$, then the bitmap of its left child is at
$B[n + l, n + l + rank_0(B,l,r) - 1]$, and that of the right child is at
$B[n + l + rank_0(B,l,r), n + r]$. Moving to the parent of a node is more
complicated, but upward traversals can always be handled by first going down
from the root to the desired leaf, so as to discover all the ranges in $B$ of the
nodes in the path, and then doing the upward processing as one returns from
the recursion.

Using just one bitmap, we do not need pointers for the topology, and the
overall space becomes $n\lceil\lg\sigma\rceil + o(n)\lg\sigma$ bits. The time
complexities do not change (albeit in practice the operations are slowed down a bit
due to the extra rank operations needed to navigate [28]).
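
The pointerless navigation can be sketched as below. To keep the example short it assumes $\sigma = 2^h$ with symbols in $[0..2^h - 1]$, so the tree is perfectly balanced and the bit at level $\ell$ is simply the $\ell$-th most significant bit of the symbol; the function names are hypothetical and rank is a naive scan. The child ranges follow the formulas $B[n+l, n+l+rank_0(B,l,r)-1]$ and $B[n+l+rank_0(B,l,r), n+r]$.

```python
def build_levelwise(S, h):
    """Concatenate the h level bitmaps of a wavelet tree over [0..2^h - 1]."""
    B, cur = [], list(S)
    for level in range(h):
        shift = h - 1 - level
        B.extend((c >> shift) & 1 for c in cur)
        # A stable sort by the top (level + 1) bits yields the order in which
        # the symbols appear in the concatenated nodes of the next level.
        cur.sort(key=lambda c: c >> shift)
    return B

def rank(B, bit, l, r):
    """Occurrences of `bit` in B[l..r], 1-based inclusive (naive scan)."""
    return sum(1 for x in B[l - 1:r] if x == bit)

def access(B, n, h, i):
    """Return S[i] (1-based) using only the single bitmap B[1..n*h]."""
    l, r, sym = 1, n, 0                    # range of the current node inside B
    for level in range(h):
        bit = B[i - 1]
        sym = (sym << 1) | bit
        if level == h - 1:
            break                          # last level: no child to move to
        zeros = rank(B, 0, l, r)
        if bit == 0:                       # left child: B[n+l, n+l+zeros-1]
            i = n + l + rank(B, 0, l, i) - 1
            l, r = n + l, n + l + zeros - 1
        else:                              # right child: B[n+l+zeros, n+r]
            i = n + l + zeros + rank(B, 1, l, i) - 1
            l, r = n + l + zeros, n + r
    return sym

S = [3, 0, 5, 7, 1, 1, 6, 2, 4, 0]
B = build_levelwise(S, 3)
recovered = [access(B, len(S), 3, i) for i in range(1, len(S) + 1)]
```

Each level costs extra rank computations to locate the child range, which matches the remark that navigation is somewhat slower in practice than with explicit pointers.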

The redundancy can be further reduced by representing the bitmaps using
a structure by Golynski et al. [50], which uses $n + O(n\lg\lg n/\lg^2 n)$ bits and
supports constant-time rank and select (this representation does not leave the
bitmap in plain form, and thus it can break the lower bound [49]). Added over
all the wavelet tree bitmaps, the space becomes
$n\lg\sigma + O(n\lg\sigma\lg\lg n/\lg^2 n) = n\lg\sigma + o(n)$ bits.¹ This structure
has not been implemented as far as we know.

Speeding up traversals. Increasing the arity of wavelet trees reduces their 
height, which dictates the complexity of the downward and upward traversals. 

¹ We assume $\lg\sigma = O(\lg n)$ here; otherwise there are many symbols that do not
appear in $S$. If this turns out to be the case, one should use a mapping from $\Sigma$
to the range $[1..\sigma']$, where $\sigma' \le n$ is the number of symbols actually
appearing in $S$. Such a mapping takes constant time and
$\sigma'\lg(\sigma/\sigma') + o(\sigma') + O(\lg\lg\sigma)$ bits of space using the
"indexable dictionaries" of Raman et al. [93]. Added to the $n\lg\sigma' + o(n)$ bits of
the wavelet tree, we are within $n\lg\sigma + o(n) + O(\lg\lg\sigma)$ bits. This is
$n\lg\sigma + o(n)$ unless $n = O(\lg\lg\sigma)$, in which case a plain representation
of $S$ using $n\lceil\lg\sigma\rceil$ bits solves all the operations in
$O(\lg\lg\sigma)$ time. To simplify, a recent analysis [45] claims $n\lg\sigma + O(n)$
bits under similar assumptions. We will ignore the issue from now on, and assume for
simplicity that all symbols in $[1..\sigma]$ do appear in $S$.



If the wavelet tree is $d$-ary, then its height is $\lceil\lg_d\sigma\rceil$. However,
the wavelet tree does not store bitmaps anymore, but rather sequences $B_v$ over
alphabet $[1..d]$, so that the symbol at $S_v[i]$ is stored at the child numbered
$B_v[i]$ of node $v$.

In order to obtain time complexities $O(1 + \lg_d\sigma)$ for the operations, we need
to handle rank and select on sequences over alphabet $[1..d]$, in constant time.
Ferragina et al. [40] showed that this is indeed possible, while maintaining the
overall space within $n\lg\sigma + o(n)\lg\sigma$, for $d = o(\lg n/\lg\lg n)$. Using,
for example, $d = \lg^{1-\epsilon} n$ for any constant $0 < \epsilon < 1$, the overall
space is $n\lg\sigma + O(n\lg\sigma/\lg^{\epsilon} n)$ bits. Golynski et al. [50]
reduced the space to $n\lg\sigma + o(n)$ bits.

To support symbol rank and select on a sequence $R[1,n]$ over alphabet
$[1..d]$, we assume we have $d$ bitmaps $B_c[1,n]$, for $c \in [1..d]$, where
$B_c[i] = 1$ iff $R[i] = c$. Then $rank_c(R,i)$ and $select_c(R,i)$ are reduced to
$rank_1(B_c,i)$ and $select_1(B_c,i)$. We cannot afford to store those $B_c$, but we can
store their extra $o(n)$ data for binary rank and select. Each time we need access to
$B_c$, we access instead $R$ and use a universal table to simulate the bitmap's
content. Such a table gives constant-time access to chunks of length $(\lg_d n)/2$
instead of $(\lg n)/2$, so the overall space using Golynski's bitmap index
representation [48,49] is $O(dn\lg\lg n/\lg_d n)$, which added over the $\lg_d\sigma$
levels of the wavelet tree gives $O(n\lg\sigma \cdot d\lg d\lg\lg n/\lg n)$. This is
$o(n\lg\sigma)$ for any $d = \lg^{1-\epsilon} n$. Further reducing the redundancy to
$o(n)$ bits requires more sophisticated techniques [50].

Thus, the $O(\lg\sigma)$ upward/downward traversal times become
$O(\lg\sigma/\lg\lg n)$ with multiary wavelet trees. Although theoretically attractive,
it is not easy to translate their advantages to practice (see, e.g., a recent work
studying interesting practical alternatives [17]). An exception, for a particular
application, is described in the paragraph "Positional inverted indexes" of Section 5.

The upward traversal can be sped up further, using techniques known in
computational geometry [25]. Imagine we are at a leaf $u$ representing a sequence
$S_u[1,n_u]$ and want to directly track position $i$ to an ancestor $v$ at distance $t$,
which represents sequence $S_v[1,n_v]$. We can store at the leaf $u$ a bitmap
$B_u[1,n_v]$, so that the $n_u$ positions corresponding to leaf $u$ are marked as 1s in
$B_u$. This bitmap is sparse, so it is stored in compressed form as an "indexable
dictionary" [93], which uses $n_u\lg(n_v/n_u) + o(n_u) + O(\lg\lg n_v)$ bits and can
answer $select_1(B_u,i)$ queries in $O(1)$ time. Thus we track position $i$ upwards for
$t$ levels in $O(1)$ time.

The space required for all the bitmaps that point to node $v$ is the sum, over
at most $2^t$ leaves $u$, of those $n_u\lg(n_v/n_u) + o(n_u) + O(\lg\lg n_v)$ bits. This
is maximized when $n_u = n_v/2^t$ for all those $u$, where the space becomes
$t \cdot n_v + o(n_v) + O(2^t\lg\lg n_v)$. Added over all the wavelet tree nodes with
height multiple of $t$, we get
$n\lg\sigma + o(n\lg\sigma) + O(\sigma\lg\lg n) = n\lg\sigma + o(n\lg\sigma)$. This is
in addition to those $n\lg\sigma + o(n)$ bits already used by the wavelet tree.

If we want to track only from the leaves to the root, we may just use $t = \lg\sigma$
and do the tracking in constant time. In many cases, however, one wishes to
track from arbitrary to arbitrary nodes. In this case we can use $1/\epsilon$ values of
$t = \lg^{i\epsilon}\sigma$, for $i \in [1..1/\epsilon - 1]$, so as to carry out
$O(\lg^{\epsilon}\sigma)$ upward steps with one value of $t$ before reaching the next
one. This gives a total complexity for upward traversals of
$O((1/\epsilon)\lg^{\epsilon}\sigma)$ using $O((1/\epsilon)n\lg\sigma)$ bits of space.



Construction. It is easy to build a wavelet tree in $O(n\lg\sigma)$ time, by a
linear-time processing at each node. It is less obvious how to do it in little extra
space, which may be important for succinct data structures. Two recent results [31, 96]
offer various relevant space-time tradeoffs, building the wavelet tree within the
time given, or close, and asymptotically negligible extra space.

3 Compression 

The wavelet tree adapts elegantly to the compressibility of the data in many 
ways. Two key techniques to achieve this are using specific encodings on bitmaps, 
and altering the tree shape. This whole section is technical, yet nonexpert readers 
may find inspiring the beginning of the paragraph "Entropy coding", and the 
paragraph "Changing shape". 

Entropy coding. Consider again Fig. 1. The fact that the 'a' is much more
frequent than the other symbols translates into unbalanced 0/1 frequencies in
various bitmaps. Dissimilarities in symbol frequencies are an important source
of compressibility. The amount of compression that can be reached is measured
by the so-called empirical zero-order entropy of a sequence $S[1,n]$:

$$H_0(S) = \sum_{c \in \Sigma} (n_c/n)\lg(n/n_c) \le \lg\sigma$$

where $n_c$ is the number of occurrences of $c$ in $S$ and the sum considers only the
symbols that do appear in $S$. Then $nH_0(S)$ is the least number of bits into which
$S$ can be compressed by always encoding the same symbol in the same way.²
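
As a small worked illustration, the empirical zero-order entropy of the example string of Fig. 1 can be computed directly from its symbol counts. The function name `h0` is hypothetical.

```python
from collections import Counter
from math import log2

def h0(S):
    """Empirical zero-order entropy of S, in bits per symbol."""
    n = len(S)
    return sum((nc / n) * log2(n / nc) for nc in Counter(S).values())

text = "alabar a la alabarda"
bits_per_symbol = h0(text)             # at most lg(sigma) = lg 6
total_bits = len(text) * bits_per_symbol
```

Since 'a' accounts for almost half of the symbols, `bits_per_symbol` falls below $\lg 6 \approx 2.58$, the cost per symbol of a plain fixed-length encoding over six symbols.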

Grossi et al. [54] already showed that, if the bitmaps of the wavelet tree
are compressed to their zero-order entropy, then their overall space is $nH_0(S)$.
Let $B_{v_{root}}$ contain $n_0$ 0s and $n_1$ 1s. Then zero-order compressing it yields
space $n_0\lg(n/n_0) + n_1\lg(n/n_1)$. Now consider its left child $v_l$. Its bitmap,
$B_{v_l}$, is of length $n_0$, and say it contains $n_{00}$ 0s and $n_{01}$ 1s.
Similarly, the right child is of length $n_1$ and contains $n_{10}$ 0s and $n_{11}$ 1s.
Adding up the zero-order compressed space of both children yields
$n_{00}\lg(n_0/n_{00}) + n_{01}\lg(n_0/n_{01}) + n_{10}\lg(n_1/n_{10}) + n_{11}\lg(n_1/n_{11})$.
Now adding the space of the root bitmap yields
$n_{00}\lg(n/n_{00}) + n_{01}\lg(n/n_{01}) + n_{10}\lg(n/n_{10}) + n_{11}\lg(n/n_{11})$.
This would already be $nH_0(S)$ if $\sigma = 4$. It is easy to see that, by splitting the
spaces of the internal nodes until reaching the wavelet tree leaves, we arrive at
$\sum_{c \in \Sigma} n_c\lg(n/n_c) = nH_0(S)$.
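
The telescoping argument can be checked numerically. The sketch below (hypothetical function names, balanced tree over integer symbols in $[1..\sigma]$) sums the zero-order compressed sizes of all node bitmaps and compares the result with $nH_0(S)$; it ignores the redundancy of any concrete encoding.

```python
from collections import Counter
from math import log2, isclose

def n_h0(seq):
    """n * H0(seq): the zero-order compressed size of seq, in bits."""
    n = len(seq)
    return sum(nc * log2(n / nc) for nc in Counter(seq).values())

def bitmap_bits(seq, a, b):
    """Sum of n_v * H0(B_v) over all nodes v of the wavelet tree of seq."""
    if a == b or not seq:
        return 0.0
    mid = (a + b) // 2
    bits = [0 if c <= mid else 1 for c in seq]
    left = [c for c in seq if c <= mid]
    right = [c for c in seq if c > mid]
    return n_h0(bits) + bitmap_bits(left, a, mid) + bitmap_bits(right, mid + 1, b)

text = "alabar a la alabarda"
alphabet = sorted(set(text))
S = [alphabet.index(ch) + 1 for ch in text]        # symbols in [1..sigma]
tree_bits = bitmap_bits(S, 1, len(alphabet))
sequence_bits = n_h0(S)
```

Up to floating-point rounding, `tree_bits` and `sequence_bits` coincide, for this string and for any other input.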

This enables using any zero-order entropy coding for the bitmaps that
supports constant-time rank and select. One is the "fully-indexable dictionary" of
Raman et al. [93], which for a bitmap $B[1,n]$ requires
$nH_0(B) + O(n\lg\lg n/\lg n)$ bits. A theoretically better one is that of Golynski
et al. [50], which we have already mentioned without yet telling that it actually
compresses the bitmap, to $nH_0(B) + O(n\lg\lg n/\lg^2 n)$. Pătraşcu [91] showed this
can be squeezed up to

² In classical information theory [32], $H_0$ is the least number of bits per symbol
achievable by any compressor on an infinite source that emits symbols independently
and randomly with probabilities $n_c/n$.



$nH_0(B) + O(n/\lg^c n)$, answering rank and select in time $O(c)$, for any constant
$c$, and that this is essentially optimal [92].

Using the second or third encoding, the wavelet tree represents $S$ within
$nH_0(S) + o(n)$ bits, still supporting the traversals in time $O(\lg\sigma)$. Ferragina
et al. [40] showed that the zero-order compression can be extended to multiary
wavelet trees, reaching $nH_0(S) + o(n\lg\sigma)$ bits and time
$O(1 + \lg\sigma/\lg\lg n)$ for the operations, and Golynski et al. [50] reduced the
space to $nH_0(S) + o(n)$ bits. Recently, Belazzougui and Navarro [12] showed that the
times can be reduced to $O(1 + \lg\sigma/\lg w)$, where $w = \Omega(\lg n)$ is the size
of the machine word. Basically they replace the universal tables with bit-parallel
operations. Their space grows to $nH_0(S) + o(n(H_0(S) + 1))$. (They also prove and
match the lower bound time complexity $\Theta(1 + \lg(\lg\sigma/\lg w))$ using
techniques that are beyond wavelet trees and this survey, but that do build on wavelet
trees [7,4].)

It should not be hard to see at this point that the sums of $n_u\lg(n_v/n_u)$
spaces used for fast upward traversals in Section 2 also add up to
$(1/\epsilon)nH_0(S)$.

Changing shape. The algorithms for traversing the wavelet tree work
independently of its balanced shape. Furthermore, our previous analysis of the entropy
coding of the bitmap also shows that the resulting space, at least with respect to
the entropy part, is independent of the shape of the tree. This was already noted
by Grossi et al. [55], who proposed using the shape to optimize average query
time: If we know the relative frequencies $f_c$ with which each leaf $c$ is sought,
we can create a wavelet tree with the shape of the Huffman tree [62] of those
frequencies, thus reducing the average access time to
$\sum_{c \in \Sigma} f_c\lg(1/f_c) \le \lg\sigma$.

Mäkinen and Navarro [70, Sec. 3.3], instead, proposed giving the wavelet
tree the Huffman shape of the frequencies with which the symbols appear in $S$.
This has interesting consequences. First, it is easy to see that the total number
of bits stored in the wavelet tree is exactly the number of bits output by a
Huffman compressor that takes the symbol frequencies in $S$, which is upper
bounded by $n(H_0(S) + 1)$. Therefore, even using plain bitmap representations
taking $n + o(n)$ bits of space, the total space becomes at most
$n(H_0(S) + 1) + o(n(H_0(S) + 1)) + O(\sigma\lg n)$, that is, we compress not only the
data, but also the redundancy space. This may seem irrelevant compared to the
$nH_0(S) + o(n)$ bits that can be obtained using Golynski et al. [50] over a balanced
wavelet tree. However, it is unclear whether that approach is practical; only that of
Raman et al. [93] has successful implementations [89,28,84], and this one leads
to total space $nH_0(S) + o(n\lg\sigma)$. Furthermore, plain bitmap representations are
significantly faster than compressed ones, and thus compressing the wavelet tree
by giving it a Huffman shape leads to a much faster implementation in practice.
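
The bit count of a Huffman-shaped wavelet tree can be illustrated without building the tree itself: every occurrence of a symbol contributes one bit at each internal node on the path to its leaf, so the total is $\sum_c n_c \cdot depth(c)$. The sketch below (hypothetical names; a textbook heap-based Huffman construction standing in for any Huffman implementation) computes the leaf depths and that total.

```python
import heapq
from collections import Counter
from math import log2

def huffman_depths(freqs):
    """Map each symbol to the depth of its leaf in a Huffman tree for `freqs`."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (weight, unique tie-breaker, symbols in the subtree).
    heap = [(f, k, [s]) for k, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    depth = dict.fromkeys(freqs, 0)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:                  # merged subtrees sink one level
            depth[s] += 1
        heapq.heappush(heap, (f1 + f2, tie, s1 + s2))
        tie += 1
    return depth

text = "alabar a la alabarda"
freqs = Counter(text)
depth = huffman_depths(freqs)
n = len(text)
huffman_bits = sum(freqs[s] * depth[s] for s in freqs)   # bits in all bitmaps
entropy_bits = sum(f * log2(n / f) for f in freqs.values())  # n * H0(S)
```

For any input, `huffman_bits` lies between `entropy_bits` and `entropy_bits + n`, in line with the $n(H_0(S) + 1)$ upper bound.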

Another consequence of using Huffman shape, implied by Grossi et al. [55],
is that if the accesses to the leaves are done with frequency proportional to
their number of occurrences in $S$ (which occurs, for example, if we access at
random positions in $S$), then the average access time is $O(1 + H_0(S))$, better
than the $O(\lg\sigma)$ of balanced wavelet trees. A problem is that the worst case
could be as bad as $O(\lg n)$ if a very infrequent symbol is sought [70]. However,
one can balance wavelet subtrees after some depth, so that the average depth is
$O(1 + H_0(S))$, the maximum depth is $O(\lg\sigma)$, and the total number of bits is at
most $n(H_0(S) + 2)$ [70].

Recently, Barbay and Navarro [10] showed that Huffman shapes can be
combined with multiary wavelet trees and entropy compression of the bitmaps, to
achieve space $nH_0(S) + o(n)$ bits, worst-case time $O(1 + \lg\sigma/\lg\lg n)$, and
average case time $O(1 + H_0(S)/\lg\lg n)$.

An interesting extension of Huffman shaped wavelet trees that has not been
emphasized much is to use them as a mechanism to give direct access on any
variable-length prefix-free coding. Let $S = s_1, s_2, \ldots, s_n$ be a sequence of
symbols, which are encoded in some way into a bit-stream
$C = c(s_1)c(s_2)\cdots c(s_n)$. For example, $S$ may be a numeric sequence and $c$ can
be a $\delta$-code, to favor small numbers [13], or $c$ can be a Huffman or another
prefix-free encoding. Any prefix-free encoding ensures that we can retrieve $S$ from
$C$, but if we want to maintain the compressed form $C$ and access arbitrary positions
of $S$, we need tricks like sampling $S$ at regular intervals and storing pointers to
$C$.

Instead, a wavelet tree representation of $S$, where for each $s_i$ we rather encode
$c(s_i)$, uses the same number of bits of $C$ and gives direct access to any $S[i]$ in
time $O(|c(s_i)|)$. More precisely, at the bitmap root position $B_{v_{root}}[i]$ we
write a 0 if $c(s_i)$ starts with a 0, and 1 otherwise. In the first case we continue
by the left child and in the second case we continue by the right child, from the
second bit of $c(s_i)$, until the code is exhausted. Gagie et al. [43] combined this
idea with multiary wavelet trees to obtain a faster decoding.

Very recently, Grossi and Ottaviano [56] also took advantage of specific 
shapes, to give the wavelet tree the form of a trie of a set of strings. The goal 
was to handle a sequence of strings and extend operations like access and rank 
to such strings. The idea extends a previous, more limited, approach [72,74]. 

High-order entropy coding. High-order compression extends zero-order
compression by encoding each symbol according to a context of length $k$ that
precedes or follows it. The $k$-th order empirical entropy of $S$ [77] is defined as
$H_k(S) = \sum_{A \in \Sigma^k} (|S_A|/n) H_0(S_A) \le H_{k-1}(S)$, where $S_A$ is the
string of symbols preceding context $A$ in $S$. Any statistical compressor assigning
fixed codes that depend on a context of length $k$ outputs at least $nH_k(S)$ bits to
encode $S$.

The Burrows-Wheeler transform [22] is a useful tool to achieve high-order
entropy. It is a reversible transformation that permutes the symbols of a string
$S[1,n]$ as follows. First sort all the suffixes $S[i,n]$ lexicographically, and then
list the symbols that precede each suffix (where $S[n]$ precedes $S[1,n]$). The result,
$S^{bwt}[1,n]$, is the concatenation of the strings $S_A$ for all the contexts $A$. By
definition, if we compress each substring $S_A$ of $S^{bwt}$ to its zero-order entropy,
the total space is the $k$-th order entropy of $S$, for $k = |A|$.
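
The definition translates almost literally into code. The sketch below is for illustration only (explicit suffix sorting takes quadratic time or worse); the function name `bwt` and the use of a unique terminator `$`, smaller than every other symbol, are assumptions made here so that the suffix order is unambiguous.

```python
def bwt(S):
    """Burrows-Wheeler transform of S by explicit suffix sorting."""
    n = len(S)
    order = sorted(range(n), key=lambda i: S[i:])   # suffixes in lexicographic order
    # S[i - 1] precedes suffix S[i:]; for i = 0, Python's S[-1] is the last
    # symbol, matching the convention that S[n] precedes S[1,n].
    return "".join(S[i - 1] for i in order)

transformed = bwt("banana$")        # 'annb$aa'
# Suffixes starting with context A = 'a' are contiguous in sorted order
# (here, sorted ranks 2..4), so the symbols preceding 'a' form one substring.
preceding_a = transformed[1:4]      # 'nnb'
```

The slice `preceding_a` is the string $S_A$ for the context $A$ = 'a' with $k = 1$: the symbols 'n', 'n', 'b' that precede the three occurrences of 'a' in "banana".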

The first [54] and second [39] reported use of wavelet trees used a similar
partitioning to represent each range of $S^{bwt}$ with a zero-order compressed wavelet
tree, so as to reach $nH_k(S) + o(n\lg\sigma)$ bits of space, for any
$k \le \alpha\lg_\sigma n$ and any constant $0 < \alpha < 1$. In the second case [39],
the use of $S^{bwt}$ was explicit. The partitioning was not with a fixed context length,
but instead an optimal partitioning was used [36]. This way, they obtained the given
space simultaneously for any $k$ in the range. In the first case [54], they made no
reference to the Burrows-Wheeler transform, but also compressed the sequences $S_A$ of
the $k$-th order entropy formula, for a fixed $k$. We give more details on the reasons
behind the use of $S^{bwt}$ in Section 5.

Already in 2004, Grossi et al. [55] realized that the careful partitioning into
many small wavelet trees, one per context, was not really necessary to achieve
$k$-th order compression. By using a proper encoding on its bitmaps, a wavelet
tree on the whole $S^{bwt}$ could reach $k$-th order entropy compression of a string
$S$. They obtained $2nH_k(S)$ bits, plus redundancy, by using $\gamma$-codes [13] on the
runs of 0s and 1s in the wavelet tree bitmaps. Mäkinen and Navarro [73] observed the
same fact when encoding the bitmaps using the fully indexable dictionaries of
Raman et al. [93]. They reached $nH_k(S) + o(n\lg\sigma)$ bits of space, simultaneously
for any $k \le \alpha\lg_\sigma n$ and any constant $0 < \alpha < 1$, using just one
wavelet tree for the whole string. This yielded simpler and faster indexes in
practice [28].

The key property is that some entropy-compression methods are local, that
is, their space is the sum of the zero-order entropies of short substrings of
$S^{bwt}$. This can be shown to be upper-bounded by the entropy of the whole string, but
also by the sum of the entropies of the substrings $S_A$. Even more surprisingly,
Kärkkäinen and Puglisi [67] recently showed that the $k$-th order entropy is still
reached if one cuts $S^{bwt}$ into equally-spaced regions of appropriate length, and
thus simplified these indexes further by using the faster and more practical
Huffman-shaped wavelet trees on each region.

There are also more recent and systematic studies [35, 59] of the
compressibility properties of wavelet trees, and how they relate to gap and run-length
encodings of the bitmaps, as well as to the balancing and the arity.

Exploiting repetitions. Another relevant source of compressibility is
repetitiveness, that is, that $S[1,n]$ can be decomposed into a few substrings that
have appeared earlier in $S$, or alternatively, that there is a small context-free
grammar that generates $S$. Many compressors build on these principles [13], but
supporting wavelet tree functionality on such compressed representations is harder.

Mäkinen and Navarro [71] studied the effect of repetitions in the
Burrows-Wheeler transform of $S$. They showed that $S^{bwt}$ could be partitioned into
at most $nH_k(S) + \sigma^k$ runs of equal letters, for any $k$. It is not hard to see
that those runs are inherited by the wavelet tree bitmaps, where run-length compression
would take proper advantage of them. Mäkinen and Navarro followed a different
path: they built a wavelet tree on the run heads and used a couple of bitmaps
to simulate the operations on the original strings. The compressibility of those
two bitmaps has been further studied by Mäkinen et al. [95, 75] in the context
of highly repetitive sequence collections, and also by Simon Gog [47, Sec. 3.6.1].

In some cases, however, we need the wavelet tree of the very same string $S$
that contains the repetition, not its Burrows-Wheeler transform. We describe
such an application in the paragraph "Document retrieval indexes" of Section 6.

Recently, Navarro et al. [86] proposed a grammar-compressed wavelet tree
for this problem. The key point is that repetitions in $S[1,n]$ induce repetitions
in $B_{v_{root}}[1,n]$. They used Re-Pair [69], a grammar-based compressor, on the
bitmaps, and enhanced a Re-Pair-based compressed sequence representation [53]
to support binary rank (they only needed downward traversals). This time, the
wavelet tree partitioning into left and right children cuts each repetition into
two, so quickly after a few levels such regularities are destroyed and another
type of bitmap compression (or none) is preferred. While the theoretical space
analysis is too weak to be useful, the result is good in practice and leaves open
the challenge of achieving stronger theoretical and practical results.

We will find even more specific wavelet tree compression problems later. 

4 Sequences, Reorderings, or Point Grids? 

Now that we have established the basic structure, operations, and encodings of 
wavelet trees, let us take a view with more perspective. Various applications we 
have mentioned display different ways to regard a wavelet tree representation.

As a sequence of values. This is the most basic one. The wavelet tree on a
sequence $S = s_1, \ldots, s_n$ represents the values $s_i$. The most important
operations that the wavelet tree must offer to support this view are, apart from
accessing any $S[i]$ (that we already explained in Section 2), rank and select on $S$.
For example, the second main usage of wavelet trees [39,40] used access and rank
on the wavelet tree built on sequence $S^{bwt}$ in order to support searches on $S$.

The process to support $rank_c(S,i)$ is similar to that for access, with a subtle
difference. We start at position $i$ in $B_{v_{root}}$, and decide whether to go left or
right depending on where the leaf corresponding to $c$ is (and not depending on
$B_{v_{root}}[i]$). If we go left, we rewrite $i \leftarrow rank_0(B_{v_{root}},i)$, else
we rewrite $i \leftarrow rank_1(B_{v_{root}},i)$. When we arrive at the leaf $c$, the
value of $i$ is the final answer. The time complexity for this operation is that of a
downward traversal towards the leaf labeled $c$. To support $select_c(S,i)$ we just
apply the upward tracking, as described in Section 2, starting at the $i$-th position
of the leaf labeled $c$.
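
Both operations can be sketched on a pointer-based balanced wavelet tree. The names below are hypothetical, rank and select on bitmaps are naive scans, and, since the sketch keeps no parent pointers, the upward tracking for select is carried out as the recursion returns from the leaf.

```python
class Node:
    """Wavelet tree node over the alphabet range [a..b]."""
    def __init__(self, seq, a, b):
        self.a, self.b = a, b
        self.bits = self.left = self.right = None
        if a < b:
            mid = (a + b) // 2
            self.bits = [0 if c <= mid else 1 for c in seq]
            self.left = Node([c for c in seq if c <= mid], a, mid)
            self.right = Node([c for c in seq if c > mid], mid + 1, b)

def bit_rank(bits, bit, i):
    """Occurrences of `bit` in bits[1..i] (1-based)."""
    return sum(1 for x in bits[:i] if x == bit)

def bit_select(bits, bit, j):
    """1-based position of the j-th occurrence of `bit`."""
    for p, x in enumerate(bits, start=1):
        if x == bit:
            j -= 1
            if j == 0:
                return p
    raise ValueError("fewer than j occurrences")

def rank_c(root, c, i):
    """Occurrences of symbol c in S[1..i]: downward traversal towards leaf c."""
    v = root
    while v.a != v.b:
        if c <= (v.a + v.b) // 2:
            i, v = bit_rank(v.bits, 0, i), v.left
        else:
            i, v = bit_rank(v.bits, 1, i), v.right
    return i

def select_c(v, c, j):
    """Position in the sequence of node v of the j-th occurrence of c."""
    if v.a == v.b:
        return j                           # start at position j of leaf c
    if c <= (v.a + v.b) // 2:
        return bit_select(v.bits, 0, select_c(v.left, c, j))
    return bit_select(v.bits, 1, select_c(v.right, c, j))

text = "alabar a la alabarda"
alphabet = sorted(set(text))
S = [alphabet.index(ch) + 1 for ch in text]
root = Node(S, 1, len(alphabet))
```

With this mapping, `rank_c(root, 2, 6)` counts the occurrences of 'a' in "alabar", and `select_c(root, 5, 2)` gives the position of the second 'l'.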

As a reordering. Less obviously, the wavelet tree structure describes a stable 
ordering of the symbols in S, so that if one traverses the leaves one finds first 
all the occurrences of the smaller symbols, and within the same symbol (i.e., the 
same leaf), they are ordered by original position. As will be clear in Section 5,
one can argue that this is the usage of wavelet trees made by their creators [54] . 

In this case, tracking a position downwards in the wavelet tree tells where 
it goes after sorting, and tracking a position upwards tells where each symbol 
is placed in the sequence. An obvious application is to encode a permutation π
over [1..n]. Our best wavelet tree takes n lg n + o(n) bits and can compute any
π(i) and π^{-1}(i) in time O(lg n / lg lg n) by carrying out, respectively, downward
and upward tracking of position i. We will see improvements on this idea later.

As a grid of points. The slightly less general structure of Chazelle [25] can
be taken as the representation of a set of points supported by wavelet trees. It is
generally assumed that we have an n × n grid with n points so that no two points
share the same row or column (i.e., a permutation). A general set of n points is 
mapped to such a discrete grid by storing the real coordinates somewhere else 
and breaking ties somehow (arbitrarily is fine in most cases). 



Take the set of points (x_i, y_i), in x-coordinate order (i.e., x_i < x_{i+1}). Now
define string S[1, n] = y_1, y_2, ..., y_n. Then we can find the i-th point in
x-coordinate order by accessing S[i]. Moreover, since the wavelet tree is
representing the reordering of the points according to y-coordinate, one can find
the i-th point in y-coordinate order by tracking upwards the i-th point in the leaves.

Unlike permutations, here the emphasis is on counting and reporting the
points that lie within a rectangle [x_min, x_max] × [y_min, y_max]. This is solved
through a more complicated tracking mechanism, well-known in computational
geometry and also described explicitly on wavelet trees [72]. We start at the
root bitmap range B_vroot[x_l, x_r], where x_l = x_min and x_r = x_max. Now we map
the interval to the left and to the right, using x_l ← rank_{0/1}(B_vroot, x_l - 1) + 1
and x_r ← rank_{0/1}(B_vroot, x_r), and continue recursively. At any node along the
recursion, we may stop if (i) the interval [x_l, x_r] becomes empty (thus there
are no points to report); (ii) the interval of leaves (i.e., y-coordinate values)
represented by the node has no intersection with [y_min, y_max]; (iii) the interval
of leaves is contained in [y_min, y_max]. In case (iii) we can count the number of
points falling in this sub-rectangle as x_r - x_l + 1. As it is well known that we visit
only O(lg n) wavelet tree nodes before stopping all the recursive calls (see, e.g.,
a recent detailed proof, among other more sophisticated wavelet tree properties
[45]), the counting time is O(lg n). Each of the x_r - x_l + 1 points found in each
node can be tracked up and down to find their x- and y-coordinates, in O(lg n)
time per reported occurrence. There are more efficient variants of this technique
that we will cover in Section 7, but they build on this basic idea.
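
The counting recursion, with its three stopping cases, can be sketched as
follows. This is only an illustration under simplifying assumptions: nodes are
plain tuples, prefix counters replace the rank-capable bitmaps, and the names
build and count are hypothetical.

```python
def build(ys, lo, hi):
    """Wavelet tree node (lo, hi, zeros, left, right) for y-values in [lo, hi]."""
    if lo == hi or not ys:
        return (lo, hi, None, None, None)
    mid = (lo + hi) // 2
    zeros = [0]                          # zeros[i] = rank_0(B_v, i)
    for y in ys:
        zeros.append(zeros[-1] + (y <= mid))
    return (lo, hi, zeros,
            build([y for y in ys if y <= mid], lo, mid),
            build([y for y in ys if y > mid], mid + 1, hi))


def count(node, xl, xr, ymin, ymax):
    """Number of points with x in [xl, xr] and y in [ymin, ymax] (x is 1-based)."""
    lo, hi, zeros, left, right = node
    if xl > xr or hi < ymin or ymax < lo:        # stopping cases (i) and (ii)
        return 0
    if ymin <= lo and hi <= ymax:                # stopping case (iii)
        return xr - xl + 1
    # map [xl, xr] to the left child with rank_0 and to the right child with rank_1
    return (count(left, zeros[xl - 1] + 1, zeros[xr], ymin, ymax)
            + count(right, xl - zeros[xl - 1], xr - zeros[xr], ymin, ymax))


ys = [3, 1, 4, 6, 5, 2, 8, 7]            # S[i] = y-coordinate of the i-th point by x
root = build(ys, 1, 8)
print(count(root, 2, 6, 2, 5))           # points with x in [2, 6] and y in [2, 5]
```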

5 Applications as Sequences 

Full-text indexes. A full-text index built on a string S[1,n] is able to count and
locate the occurrences of arbitrary patterns P[1,m] in S. A classical index is the
suffix array [52,76], A[1,n], which lists the starting positions of all the suffixes
of S, S[A[i],n], in lexicographic order, using n⌈lg n⌉ bits. The starting positions
of the occurrences of P in S appear in a contiguous range in A, which can be
binary searched in time O(m lg n), or O(m + lg n) by doubling the space. A suffix
tree [98, 78, 1] is a more space-consuming structure (yet still O(n lg n) bits) that
can find the range in time O(m). After finding the range, each occurrence is
reported in constant time, both in suffix trees and arrays.

The suffix array of S is closely related to its Burrows-Wheeler transform:
S^bwt[i] = S[A[i] - 1] (taking S[0] = S[n]). Ferragina and Manzini [37, 38] showed
how, using at most 2m access and rank operations on S^bwt, one could count
the number of occurrences in S of a pattern P[1,m]. Using multiary wavelet
trees [40, 50] this gives a counting time of O(m) on polylog-sized alphabets, and
O(m lg σ / lg lg n) in general. Each such occurrence can then be located in time
O(lg^{1+ε} n lg σ / lg lg n) for any ε > 0, at the price of O(n / lg^ε n) = o(n) further
bits of space. This result has been superseded very recently [7, 12, 11, 4], in some
cases using wavelet trees as a part of the solution, and in all cases with some
extra price in terms of redundancy, such as o(n H_k(S)) and O(n) further bits.
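
To make the counting procedure concrete, here is a small sketch of backward
search in the usual textbook formulation, which issues two rank operations on
S^bwt per pattern symbol. It is not the cited authors' implementation: the
suffix array is built by naive sorting, a "\0" terminator is assumed to be
smaller than every text symbol, and rank is a linear scan standing in for the
wavelet tree.

```python
def bwt_count(text, pattern):
    """Count occurrences of pattern in text by backward search on the BWT."""
    s = text + "\0"                          # assumed unique, smallest terminator
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    bwt = [s[i - 1] for i in sa]             # S^bwt[i] = S[A[i] - 1], wrapping around
    # C[c] = number of symbols in s that are smaller than c
    C = {c: sum(ch < c for ch in s) for c in set(s)}

    def rank(c, i):
        # naive stand-in for rank_c(S^bwt, i) on a wavelet tree
        return bwt[:i].count(c)

    sp, ep = 0, len(s)                       # half-open range of suffix array rows
    for c in reversed(pattern):
        if c not in C:
            return 0
        sp, ep = C[c] + rank(c, sp), C[c] + rank(c, ep)
        if sp >= ep:
            return 0
    return ep - sp


print(bwt_count("abracadabra", "abra"))
```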



Grossi et al. [57, 58, 54] used wavelet trees to obtain a similar result via a
quite different strategy. They represented A by means of a permutation Ψ(i) =
A^{-1}[A[i] + 1], that is, the cell in A pointing to A[i] + 1. Ψ turns out to be formed
by σ contiguous ascending runs. The suffix array search can be simulated in
O(m lg n) accesses to Ψ. They encode Ψ separately for the range of each context
S_A (recall paragraph "High-order entropy coding" in Section 3). As all the Ψ
pointers coming from each run are increasing, a wavelet tree is used to describe
how the σ ascending sequences of pointers coming from each run are intermingled
in the range of S_A. This turns out to be, precisely, the wavelet tree of S_A. This
is why both Ferragina et al. and Grossi et al. obtain basically the same space,
n H_k(S) + o(n lg σ) bits. Due to the different search strategy, the counting time
of Grossi et al. is higher. On the other hand, the representation of Ψ allows them
to locate patterns in sublogarithmic time, still using O(n H_k(S)) + o(n lg σ) bits.

This is the best known usage of wavelet trees as sequences, and it is well 
covered in earlier surveys [82]. New extensions of these basic concepts, supporting 
more sophisticated search problems, appear every year (e.g., [94, 14]). We cover 
next other completely different applications. 

Positional inverted indexes. Consider a natural language text collection. A 
positional inverted index is a data structure that stores, for each word, the list 
of the positions where it appears in the collection [3]. In compressed form [99] it
takes space close to the zero-order entropy of the text seen as a sequence of words
[82]. This entropy yields very competitive compression in natural language texts.
Yet, we need to store both the text (usually zero-order compressed, so that direct
access is possible) and the inverted index, adding up to at least 2n H_0(S), where
S is the text regarded as a sequence of word identifiers. Inverted indexes are by 
far the most popular data structures to index natural language text collections, 
so reducing their space requirements is of high relevance. 

By representing the sequence of word identifiers using a wavelet tree, we
obtain a single representation for both the text and the inverted index, all within
n H_0(S) + o(n) bits [28]. In order to access any text word, we just compute S[i].
In order to access the i-th element of the inverted list of any word c, we compute
select_c(S, i). Furthermore, operation rank_c(S, i) is useful to implement some
list intersection algorithms [8], as it finds the position i in the inverted list of
word c more efficiently than with a binary or exponential search.

Arroyuelo et al. [2] extended this functionality to document retrieval: retrieve 
the distinct documents where a word appears. They use a special symbol "$" 
to mark document boundaries. Then, given the first occurrence of a word c, 
p = select_c(S, 1), the document where this occurrence lies is j = rank_$(S, p) + 1,
document j ends at position p' = select_$(S, j), it contains o = rank_c(S, p, p')
occurrences of the word c, and the search for further relevant documents can
continue from query select_c(S, o + 1).
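
The document listing loop can be sketched as follows. The sketch assumes that
every document, including the last one, is terminated by "$", and it uses
linear-scan rank and select on a Python list as stand-ins for the wavelet
tree operations; the function names are illustrative.

```python
def rank(S, c, i):
    """Occurrences of c in S[1..i] (naive stand-in for the wavelet tree rank)."""
    return S[:i].count(c)


def select(S, c, j):
    """1-based position of the j-th occurrence of c in S, or None if absent."""
    seen = 0
    for pos, x in enumerate(S, 1):
        if x == c:
            seen += 1
            if seen == j:
                return pos
    return None


def docs_with_word(S, c):
    """Distinct documents (numbered from 1) that contain word c."""
    docs = []
    consumed = 0                       # occurrences of c already accounted for
    total = rank(S, c, len(S))
    while consumed < total:
        p = select(S, c, consumed + 1)         # next unreported occurrence
        j = rank(S, "$", p) + 1                # document containing position p
        p_end = select(S, "$", j)              # end of document j
        docs.append(j)
        consumed = rank(S, c, p_end)           # skip the occurrences inside doc j
    return docs


S = "a b $ b c b $ a $ b $".split()
print(select(S, "b", 3))               # third entry of the inverted list of "b"
print(docs_with_word(S, "b"))
```

Note that select(S, c, i) alone already plays the role of the i-th entry of
the positional inverted list of c.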

An improvement over the basic idea is to use multiary wavelet trees, more 
precisely of arity up to 256, and using the property that wavelet trees give direct 
access to any variable-length code. Brisaboa et al. [19] started with a
byte-oriented encoding of the text words (using either Huffman with 256 target
symbols, or other practical encoding methods [20]) and then organized the sequence
of codes into a wavelet tree, as described in the paragraph "Changing shape" of 
Section 3. A naive byte-based rank and select implementation on the wavelet 
tree levels gives good results in this application, with the bytes represented in 
plain form. The resulting structure is indeed competitive with positional inverted 
indexes in many cases. A variant specialized on XML text collections, where the 
codes are also used to distinguish structural elements (tags, content, attributes, 
etc.) in order to support some XPath queries, is also being developed [18]. 

Graphs. Another simple application of this idea is the representation of directed 
graphs [28]. Let G be a graph with n nodes and e edges. An adjacency list, 
using n lg e + e lg n bits (the n pointers to the lists plus the e target nodes)
gives direct access to the neighbors of any node v. If we want also to perform
reverse navigation, that is, to know which nodes point to v, we must spend other
n lg e + e lg n bits to represent the transposed graph.

Once again, representing with a wavelet tree the sequence S[1,e] concatenating
all the adjacency lists, plus a compressed bitmap B[1,e] marking the
beginnings of the lists, gives access to both types of neighbors within space
n lg(e/n) + e lg n + O(n) + o(e), which is close to the space of the plain
representation (actually, possibly less). To retrieve the i-th neighbor of a node v,
we compute the starting point of the list of v, l ← select_1(B, v), and then
access S[l + i - 1]. To retrieve the i-th reverse neighbor of a node v, we compute
p ← select_v(S, i) to find the i-th time that v is mentioned in an adjacency
list, and then compute with rank_1(B, p) the owner of the list where v is
mentioned. Both operations take time O(lg n / lg lg n). This is also useful to represent
undirected graphs, where adjacency lists must usually represent each edge twice.
With a wavelet tree we can choose any direction for an edge, and at query time
we join direct and reverse neighbors of nodes to build their list.
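
A small sketch of both navigation directions follows. It assumes that nodes
are numbered 1..n and that every node has at least one out-neighbor (so that
a bitmap of length e can mark all list beginnings), and it uses linear-scan
rank and select as stand-ins for the wavelet tree and the compressed bitmap;
all function names are illustrative.

```python
def build_graph(adj):
    """Concatenate adjacency lists into S; B marks the first cell of each list."""
    S, B = [], []
    for v in range(1, len(adj) + 1):
        for k, u in enumerate(adj[v]):
            S.append(u)
            B.append(1 if k == 0 else 0)
    return S, B


def rank(seq, c, i):
    """Occurrences of c in seq[1..i] (naive stand-in)."""
    return seq[:i].count(c)


def select(seq, c, j):
    """1-based position of the j-th occurrence of c in seq (assumed to exist)."""
    seen = 0
    for pos, x in enumerate(seq, 1):
        seen += (x == c)
        if x == c and seen == j:
            return pos
    raise ValueError("fewer than j occurrences")


def neighbor(S, B, v, i):
    """i-th out-neighbor of v: l = select_1(B, v), then S[l + i - 1]."""
    l = select(B, 1, v)
    return S[l + i - 1 - 1]            # the extra -1 converts to Python's 0-based index


def reverse_neighbor(S, B, v, i):
    """i-th node pointing to v: p = select_v(S, i), then rank_1(B, p)."""
    p = select(S, v, i)
    return rank(B, 1, p)


adj = {1: [2, 3], 2: [3], 3: [1, 2]}
S, B = build_graph(adj)
print(neighbor(S, B, 3, 2), reverse_neighbor(S, B, 3, 1), reverse_neighbor(S, B, 3, 2))
```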

Note, finally, that the wavelet tree can compress S to its zero-order entropy, 
which corresponds to the distribution of in-degrees of the nodes. A more
sophisticated variant of this idea, combined with Re-Pair compression [69], was shown
to be competitive with current Web graph compression methods [29].

6 Applications as Reorderings 

Apart from its first usage [54], that can be regarded as encoding a reordering, 
wavelet trees offer various interesting applications when seen in this way. 

Permutations. As explained in Section 4, one can easily encode a permutation
with a wavelet tree. It is more interesting that the encoding can take less space
when the permutation is, in a sense, compressible. Barbay and Navarro [9, 10]
considered permutations π of [1..n] that can be decomposed into ρ contiguous
ascending runs, of lengths r_1, r_2, ..., r_ρ. They define the entropy of such a
permutation as H(π) = Σ_{i=1}^{ρ} (r_i/n) lg(n/r_i), and show that it is possible to sort
an array with such ascending runs in time O(n(H(π) + 1)). This is obtained by
building a Huffman tree on the run lengths (seen as frequencies) and running a
mergesort-like algorithm that follows the Huffman tree shape.
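
As a worked illustration of the entropy measure only (not of the sorting
algorithm), the following sketch splits a non-empty permutation into its
maximal ascending runs and evaluates H(π); the function names are illustrative.

```python
from math import log2


def ascending_runs(perm):
    """Lengths r_1, ..., r_rho of the maximal contiguous ascending runs."""
    runs, length = [], 1
    for a, b in zip(perm, perm[1:]):
        if b > a:
            length += 1
        else:
            runs.append(length)
            length = 1
    runs.append(length)
    return runs


def run_entropy(perm):
    """H(pi) = sum over runs of (r_i / n) * lg(n / r_i)."""
    n = len(perm)
    return sum(r / n * log2(n / r) for r in ascending_runs(perm))


pi = [2, 5, 7, 1, 3, 4, 6, 8]
print(ascending_runs(pi), round(run_entropy(pi), 4))
```

For this permutation the two runs have lengths 3 and 5, so H(π) is below one
bit per element, whereas a permutation with n runs of length 1 reaches lg n.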



They note that, if we encode with 0 or 1 the results of the comparisons of the
mergesort algorithm at each node of the merging tree, the resulting structure
contains at most n(H(π) + 1) bits, and it represents the permutation. Starting
at position i in the top bitmap B_vroot one can track down the position exactly as
done with wavelet trees, so as to arrive at position j of the t-th leaf (i.e., run). By
storing, in O(ρ lg n) bits, the starting position of each run in π, we can convert
the leaf position into a position in π. Therefore the downward traversal solves
operation π^{-1}(i), because it starts from value i (i.e., position i after sorting π),
and gives the position in π from where it started before the merging took place.
The corresponding upward traversal, consequently, solves π(i). Other types of
runs, more and less general, are also studied [9, 10].

Some thought reveals that this structure is indeed the wavelet tree of a
sequence formed by replacing, in π^{-1}, each symbol belonging to the t-th run, by
the run identifier t. Then the fact that a downward traversal yields π^{-1}(i) and
that the upward traversal yields π(i) are natural consequences. This relation is
made more explicit in a later article [7, 4].

Generic numeric sequences. There are several basic problems on sequences 
of numbers that can be solved in nontrivial ways using wavelet trees. We mention 
a few that have received attention in the literature. 

One such problem is the range quantile query: Preprocess a sequence of
numbers S[1,n] on the domain [1..σ] so that later, given a range [l,r] and a value i,
we can compute the i-th smallest element in S[l,r].

Classical solutions to this problem have used nearly quadratic space and
constant time. Only a very recent solution [65] reaches O(n lg n) bits of space
(apart from storing S) and O(lg n / lg lg n) time. We show that, by representing
S with a wavelet tree, we can solve the problem in O(lg σ) time and just o(n)
extra bits [46, 45]. This is close to O(lg n / lg lg n) (in this problem, we can always
make σ ≤ n hold), and it can be even better if σ is small compared to n.

Starting from the range S[l,r], we compute rank_0(B_vroot, l, r). If this is i or
more, then the i-th value in this range is stored in the left subtree, so we go to
the left child and remap the interval [l,r] as done for counting points in a range
(see Section 4). Otherwise we go right, subtracting rank_0(B_vroot, l, r) from i and
remapping [l,r] in the same way. When we arrive at a leaf, its label is the i-th
smallest element in S[l,r].
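
The descent just described can be sketched in a few lines. As before, this is
an illustration under simplifying assumptions: nodes are plain tuples, prefix
counters stand in for the rank-capable bitmaps, and the names build and
range_quantile are hypothetical.

```python
def build(seq, lo, hi):
    """Wavelet tree node (lo, hi, zeros, left, right) over values in [lo, hi]."""
    if lo == hi or not seq:
        return (lo, hi, None, None, None)
    mid = (lo + hi) // 2
    zeros = [0]                          # zeros[i] = rank_0(B_v, i)
    for x in seq:
        zeros.append(zeros[-1] + (x <= mid))
    return (lo, hi, zeros,
            build([x for x in seq if x <= mid], lo, mid),
            build([x for x in seq if x > mid], mid + 1, hi))


def range_quantile(node, l, r, i):
    """i-th smallest value of S[l..r] (1-based, 1 <= i <= r - l + 1)."""
    while node[2] is not None:
        lo, hi, zeros, left, right = node
        z = zeros[r] - zeros[l - 1]      # rank_0(B_v, l, r)
        if i <= z:                       # the answer lies in the left subtree
            l, r, node = zeros[l - 1] + 1, zeros[r], left
        else:                            # discard the z smaller values on the left
            i -= z
            l, r, node = l - zeros[l - 1], r - zeros[r], right
    return node[0]                       # leaf label


S = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
root = build(S, 1, 9)
print(range_quantile(root, 3, 8, 2))     # 2nd smallest of S[3..8] = 4,1,5,9,2,6
```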

Another fundamental problem is called range next value: Preprocess a
sequence of numbers S[1,n] on the domain [1..σ] so that later, given a range [l,r]
and a value x, we return the smallest value in S[l,r] that is larger than x.

The state of the art also includes superlinear-space and constant-time
solutions, as well as one using O(n lg n) bits of space and O(lg n / lg lg n) time [100].
Once again, we achieve o(n) extra bits and O(lg σ) time using wavelet trees [45]
(we improve this time in the paragraph "Binary relations" of Section 7).

Starting at the root from the range S[l,r], we see if value x labels a leaf
descending from the left or from the right child. If x descends from the right
child, then no value on the left child can be useful, so we recursively descend
to the right child and remap the interval [l,r] as done for counting points in a
range. Else, there may be values > x on both children, but we prefer those on
the left, if any. So we first descend to the left child looking for an answer (there
may be no answer if, at some node, the interval [l,r] becomes empty). If the left
child returns an answer, this is what we seek and we return it. If, however, there
is no value > x on the left child, we seek the smallest value on the right child.
We then enter into another mode where we see if there is any 0-bit in B_v[l,r].
If there is one, we go to the left child, else we go to the right child. It can be
shown that the overall process takes O(lg σ) time.
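
A compact recursive sketch of this left-first search follows. It folds the
two modes into one routine: a subtree is pruned as soon as its largest label
is not above x or its mapped interval is empty. The tuple-based nodes, the
prefix counters and the function names are simplifying assumptions, and the
sketch makes no attempt to reproduce the stated time bound.

```python
def build(seq, lo, hi):
    """Wavelet tree node (lo, hi, zeros, left, right) over values in [lo, hi]."""
    if lo == hi or not seq:
        return (lo, hi, None, None, None)
    mid = (lo + hi) // 2
    zeros = [0]                          # zeros[i] = rank_0(B_v, i)
    for x in seq:
        zeros.append(zeros[-1] + (x <= mid))
    return (lo, hi, zeros,
            build([x for x in seq if x <= mid], lo, mid),
            build([x for x in seq if x > mid], mid + 1, hi))


def range_next_value(node, l, r, x):
    """Smallest value in S[l..r] that is strictly larger than x, or None."""
    lo, hi, zeros, left, right = node
    if l > r or hi <= x:                 # empty interval, or every label is <= x
        return None
    if zeros is None:                    # non-empty leaf whose label exceeds x
        return lo
    # prefer the left child (smaller values); fall back to the right child
    ans = range_next_value(left, zeros[l - 1] + 1, zeros[r], x)
    if ans is None:
        ans = range_next_value(right, l - zeros[l - 1], r - zeros[r], x)
    return ans


S = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
root = build(S, 1, 9)
print(range_next_value(root, 2, 5, 1))   # smallest value above 1 in S[2..5] = 1,4,1,5
```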

A variant of the range next value problem is called prevLess [68]: return the
rightmost value in S[1,r] that is smaller than x. Here we start with S[1,r]. If
value x labels a leaf descending from the left, we map the interval to the left
child and continue recursively from there. If, instead, x descends from the right
child, then the answer may be on the left or the right child, and we prefer the
rightmost in [1,r]. Any 0-bit in B_v[1,r] is a value smaller than x and thus a valid
answer. We use rank and select to find the rightmost 0 in B_v[1,r]. We also
continue recursively by the right child, and if it returns an answer, we map it to
the bitmap B_v[1,r]. Then we choose the rightmost between the answer from the
right child and the rightmost zero. The overall time is O(lg σ).

Non-positional inverted indexes. These indexes store only the list of distinct 
documents where each word appears, and come in two flavors [99, 3]. In the first, 
the documents for each word are sorted by increasing identifier. This is useful 
to implement list unions and intersections for boolean, phrase and proximity 
queries. In the second, a "weight" (measuring importance somehow) is assigned 
to each document where a word appears. The lists of each word store those 
weights and are sorted by decreasing weight. This is useful to implement ranked 
bag-of-word queries, which give the documents with highest weights added over 
all the query words. It would seem that, unless one stores two inverted indexes, 
one must choose one order to the detriment of the queries of the other type.

By representing a reordering, wavelet trees can store both orderings
simultaneously [85,45]. Let us represent the documents where each word appears in
decreasing weight order, and concatenate all the lists into a sequence S[1,n]. A
bitmap B[1,n] marks the starting positions of the lists, and the weights are stored
separately. Then, a wavelet tree representation of S simulates, within the space
of just one list, both orderings. By accessing S[l + i - 1], where l = select_1(B, c),
we obtain the i-th element of the inverted list of word c, in decreasing weight
order. To access the i-th element of the inverted list of a word in increasing
document order, we also compute the end of its list, r = select_1(B, c + 1) - 1,
and then run a range quantile query for the i-th smallest value in the range
[l,r]. Many other operations of interest in information retrieval can be carried
out with this representation and little auxiliary data [85,45].

Document retrieval indexes. An interesting extension to full-text retrieval
is document retrieval, where a collection S[1,n] of general strings (so inverted
indexes cannot be used) is to be indexed to answer different document retrieval
queries. The most basic one, document listing, is to output the distinct
documents where a pattern P[1,m] appears. Muthukrishnan [80] defined a so-called
document array D[1,n], where D[i] gives the document to which the i-th
lexicographically smallest suffix of S belongs (i.e., where the suffix S[A[i],n] belongs,
where A is the suffix array of S). He also defined an array C[1,n], where C[i]
points to the previous occurrence of D[i] in D. A suffix tree was used to identify
the range A[l,r] of the pattern occurrences, so that we seek to report the distinct
elements in D[l,r]. With further structures to find minima in ranges of C [15],
Muthukrishnan gave an O(m + occ) algorithm to find the occ distinct documents
where P appears. This is time-optimal, yet the space is impractical.

This is another case where wavelet trees proved extremely useful. Mäkinen
and Välimäki [97] showed that, if one implemented D as a wavelet tree, then
array C was not necessary, since C[i] = select_{D[i]}(D, rank_{D[i]}(D, i - 1)). They
also used a compressed full-text index [39] to identify the range D[l,r], so the
total time turned out to be O(m lg σ + occ lg d), where d is the number of documents
in S. Moreover, for each document c output, rank_c(D, l, r) gave the number of
times P appeared in c, which is important for ranked document retrieval.

Gagie et al. [46, 45] showed that an application of range quantile queries
enabled the wavelet tree to solve this problem elegantly and without any range
minima structure: The first distinct document is the smallest value in D[l,r]. If it
occurs f_1 times, then the second distinct document is the (1 + f_1)-th smallest value
in D[l,r], and so on. They retained the complexities of Mäkinen and Välimäki,
but the solution used less space and time in practice. Later [45] they replaced
the range quantile queries by a depth-first traversal of the wavelet tree that
reduced the time complexity, after the suffix array search, to O(occ lg(d/occ)).
The technique is similar to the two-dimensional range searches: recursively enter
into every wavelet tree branch where the mapped interval [l,r] is not empty, and
report the leaves found, with frequency r - l + 1.
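
The depth-first traversal can be sketched as follows, on a document array D
given directly (the suffix array search that yields [l,r] is assumed to have
been done elsewhere). Tuple-based nodes, prefix counters and the function
names are simplifying assumptions.

```python
def build(seq, lo, hi):
    """Wavelet tree node (lo, hi, zeros, left, right) over values in [lo, hi]."""
    if lo == hi or not seq:
        return (lo, hi, None, None, None)
    mid = (lo + hi) // 2
    zeros = [0]                          # zeros[i] = rank_0(B_v, i)
    for x in seq:
        zeros.append(zeros[-1] + (x <= mid))
    return (lo, hi, zeros,
            build([x for x in seq if x <= mid], lo, mid),
            build([x for x in seq if x > mid], mid + 1, hi))


def list_documents(node, l, r):
    """Pairs (document, frequency) for the distinct values in D[l..r]."""
    lo, hi, zeros, left, right = node
    if l > r:                            # empty mapped interval: prune this branch
        return []
    if zeros is None:                    # leaf: document lo occurs r - l + 1 times
        return [(lo, r - l + 1)]
    return (list_documents(left, zeros[l - 1] + 1, zeros[r])
            + list_documents(right, l - zeros[l - 1], r - zeros[r]))


D = [1, 3, 2, 1, 1, 3, 2, 2]
root = build(D, 1, 3)
print(list_documents(root, 2, 6))        # documents in D[2..6] = 3,2,1,1,3
```

The documents come out in increasing identifier order, each with its term
frequency, which is the information needed for ranked retrieval.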

This depth-first search method can easily be extended to support more
complex queries, for example t-thresholded ones: given s patterns, we want the
documents where at least t of the terms appear. We can first identify the s ranges
in D and then traverse the wavelet tree while maintaining the s ranges, stopping
when fewer than t intervals are nonempty, or when we arrive at leaves (where
we report the document). Other sophisticated traversals have been proposed for
retrieving the documents ranked by number of occurrences of the patterns [33].

An interesting problem is how to compress the wavelet tree of D effectively. 
The zero-order entropy of D has to do with document lengths, which is generally 
uninteresting, and unrelated to the compressibility of S. It has been shown [44, 86]
that the compressibility of S shows up as repetitions in D, which has stimulated 
the development of wavelet tree compression methods that take advantage of 
the repetitiveness of D, as described at the end of Section 3. 

7 Applications as Grids 

Discrete grids. Much work has been done in Computational Geometry over
structures very similar to wavelet trees. We only highlight some results of
interest, generally focusing on structures that use linear space. We assume here that
we have an n × n grid with n points not sharing rows or columns. Interestingly,
these grids with range counting and reporting operations have been intensively
used in compressed text indexing data structures [66, 81, 38, 72, 26, 16, 30, 68].

Range counting can be done in time O(lg n / lg lg n) and O(n lg n) bits [64].
This time cannot be improved within space O(n lg^{O(1)} n) [90], but it can be
matched with a multiary wavelet-tree-like structure using just n lg n + o(n lg n)
bits [16]. Reaching this time, instead of the easy O(lg n) we have explained in
Section 4, requires a sophisticated solution to the problem of doing the range
counting among several consecutive children of a node, that are completely
contained in the x-range of the query. They [16] also obtain a range reporting time
(for the occ points in the range) of O((1 + occ) lg n / lg lg n). This is not surprising
once counting has been solved: it is a matter of upward or downward tracking on
a multiary wavelet tree. The technique for faster upward tracking we described
in the paragraph "Speeding up traversals" of Section 2 can be used to improve
the reporting time to O((1 + occ) lg^ε n), using O((1/ε) n lg n) bits of space [24].

Wavelet trees offer relevant solutions to other geometric problems, such as
finding the dominant points in a grid, or solving visibility queries. Those problems
can be recast as a sequence of queries of the form "find the smallest element larger
than i in a range", described in the paragraph "Generic numeric sequences" of
Section 6, and therefore solved in time O(lg n) per point retrieved [83]. That
paper [83, 87] also studies extensions of geometric queries where the points have
weights and statistical queries on them are posed, such as finding range sums,
averages, minima, quantiles, majorities, and so on. The way those queries are
solved opens interesting new avenues in the use of wavelet trees.

Some queries, such as finding the minimum value of a two-dimensional range,
are solved by enriching wavelet trees with extra information aligned to the
bitmaps. Recall that each wavelet tree node v handles a subsequence S_v of the
sequence of points S[1,n]. To each node v with bitmap B_v[1,n_v] we associate a
data structure using 2n_v + o(n_v) bits that answers one-dimensional range
minimum queries [41] on S_v[1,n_v]. Once built, this structure does not need to access
S_v, yet it gives the position of the minimum in constant time. Since, as
explained, a two-dimensional range is covered by O(lg n) wavelet tree nodes, only
those O(lg n) minima must be tracked upwards, where the actual weights are
stored, to obtain the final result. Thus the query requires O(lg^{1+ε} n) time and
O((1/ε) n lg n) bits of space by using the fast upward tracking mechanism.

Other queries, such as finding the i-th smallest value of a two-dimensional
range, are handled with a wavelet tree on the weight values. Each wavelet tree
node stores a grid with the points whose weights are in the range handled by that
node. Then, by doing range counting queries on those grids, one can descend left
or right, looking for the rightmost leaf (i.e., value) such that the counts of the
children to the left of the path followed add up to less than i. The total time is
O(lg^2 n / lg lg n); however, the space becomes superlinear, O(n lg^2 n) bits.

Finally, an interesting extension to the typical point grids are grids of
rectangles, which are used in geographic information systems as minimum bounding
rectangles of complex objects. Then one wishes to find the set of rectangles
that intersect a query rectangle. This is well solved with an R-tree data
structure [60], but a wavelet tree may offer interesting space reductions. Brisaboa et
al. [21] describe a technique to store n rectangles where one does not contain
another in the x-coordinate range (so the set is first separated into maximal
"x-independent" subsets and each subset is queried separately). Two arrays with
the ascending lower and upper x-coordinates of the rectangles are stored (as the
sets are x-independent, the same position in both arrays corresponds to the same
rectangle). A wavelet tree on those x-coordinate-sorted rectangles is set up, so
that each node handles a range of y-coordinate values. This wavelet tree stores
two bitmaps per node v: one tells whether the rectangle S_v[i] extends to the
y-range of the left child, and the other whether it extends to the right child. Both
bitmaps can store a 1 at a position i, and thus the rectangle is stored in both
subtrees. To avoid representing a large rectangle more than O(lg n) times, both
bits are set to 0 (which is otherwise impossible) when the rectangle completely
contains the y-range of the current node. The total space is O(n lg n) bits.

Given a query [x_min, x_max] × [y_min, y_max], we look for x_min in the array of
upper x-coordinates, to find position x_l, and look for x_max in the array of lower
x-coordinates, to find position x_r. This is because a query intersects a rectangle
on the x-axis if the query does not start after the rectangle ends and the query
does not end before the rectangle starts. Now the range [x_l, x_r] is used to traverse
the wavelet tree almost like on a typical range search, except that we map to
the left child using rank_1 on one bitmap, and to the right child using rank_1 on
the other bitmap. Furthermore, we report all the rectangles where both bitmaps
contain a 0-bit, and we remove duplicates by merging results at each node, as
the same rectangle can be encountered several times. The overall time to report
the occ rectangles is still O((1 + occ) lg n).

Binary relations. A binary relation R between two sets A and B can be
thought of as a grid of size |A| × |B|, containing |R| points. Apart from strings,
permutations and our grids, that are particular cases, binary relations are good 
abstractions for a large number of more applied structures. For example, a non- 
positional inverted index is a binary relation between a set of words and a set 
of documents, so that a word is related to the documents where it appears. As 
another example, a graph is a binary relation between the set of nodes and itself. 

The most typical operations on binary relations are determining the elements
b ∈ B that are related to some a ∈ A and vice versa, and determining whether
a pair (a, b) ∈ A × B is related in R. However, more complex queries are also
of interest. For example, counting or retrieving the documents related to any
term in a range enables on-the-fly stemming and query expansion. Retrieving
the terms associated to a document permits vocabulary analyses. Accessing the
documents in a range related to a term enables searches local to subcollections.
Range counting and reporting allows regarding graphs at a larger granularity
(e.g., a Web graph can be regarded as a graph of hosts, or of pages, on the fly).

Barbay et al. [5, 6] studied a large number of complex queries for binary
relations, including accessing the points in a range in various orders, as well
as reporting rows or columns containing points in a range. They proposed two
wavelet-tree-like data structures for handling the operations. One is basically a
wavelet tree of the set of points (plus a bitmap that indicates when we move
from one column to the next). It turns out that almost all the solutions described
so far on wavelet trees find application to solve some of the operations.

In the extended version [6] they use multiary wavelet trees to reduce the
times of most of the operations. Several nontrivial structures and algorithms
are designed in order to divide the times of various operations by lg lg n (the
only precedent we know of is that of counting the number of points in a range
[16]). For example, it is shown how to solve the range next value problem (recall
paragraph "Generic numeric sequences" of Section 6) in time O(lg n / lg lg n).
Others, like the range quantile query, stay no better than O(lg n).

Barbay et al. also propose a second data structure that is analogous to the
one described for rectangles in the paragraph "Discrete grids". Two bitmaps are
stored per node, indicating whether a given column has points in the first and
in the second range of rows. This extension of a wavelet tree is less powerful
than the previous structure, but it is shown that its space is close to the entropy
of the binary relation: (1 + √2)H + O(|A| + |B| + |R|) bits, where H is the
logarithm (lg) of the binomial coefficient "|A|·|B| choose |R|".
This is not achieved with the classical wavelet tree. A separate work [34] builds
on this to obtain a fully-compressed grid representation, within H + o(H) bits.

Colored range queries. A problem of interest in areas like query log and web
mining is to count the different colors in a sequence S[1,n] over a universe of σ
colors. Inspired by the idea of Muthukrishnan [80] for document retrieval (recall
paragraph "Document retrieval indexes" in Section 6), Gagie et al. [44] showed
that this is a matter of counting how many values smaller than l are there in
C[l,r], where C[i] = max{j < i, S[j] = S[i]}. This is a range counting query for
[l,r] × [1, l - 1] on C seen as a grid, that can be solved in time O(lg n) using the
wavelet tree of C. Note that this wavelet tree, unlike that of S, uses n lg n + o(n)
bits. Gagie et al. compressed it to n lg σ + O(n lg lg n) bits, by taking advantage
of the particular structure of C, which shows up in the bit-vectors. Gagie and
Kärkkäinen [42] then reduced the space to n H_0(S) + o(n H_0(S)) + O(n) with
more advanced techniques, and also reduced the query time to O(lg(r - l + 1)).
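
The reduction from color counting to range counting on C can be checked with
a short sketch. It assumes the convention C[i] = 0 when S[i] has no previous
occurrence, and it counts by a linear scan, which is only a stand-in for the
O(lg n)-time range counting on the wavelet tree of C; the function names are
illustrative.

```python
def previous_occurrence(S):
    """C[i] = largest j < i with S[j] = S[i], or 0 if there is none (1-based)."""
    last, C = {}, []
    for i, color in enumerate(S, 1):
        C.append(last.get(color, 0))
        last[color] = i
    return C


def count_colors(C, l, r):
    """Distinct colors in S[l..r] = number of positions i in [l, r] with C[i] < l."""
    # naive stand-in for a range counting query on the wavelet tree of C
    return sum(1 for i in range(l, r + 1) if C[i - 1] < l)


S = list("abacabcb")
C = previous_occurrence(S)
print(C, count_colors(C, 3, 6))          # colors in S[3..6] = a, c, a, b
```

Only the first occurrence of each color inside [l,r] has its previous
occurrence to the left of l, which is why each color is counted exactly once.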

8 Conclusions and Further Challenges 

We have described the wavelet tree, a surprisingly versatile data structure that 
offers nontrivial solutions to a wide range of problems in areas like string process- 
ing, computational geometry, and many more. An important additional asset of 
the wavelet tree is its simplicity to understand, teach, and program. This makes 
it a good data structure to be introduced at an undergraduate level, at least in its
more basic variants. In many cases, solutions with better time complexity than 
the ones offered by wavelet trees are not so practical nor easy to implement. 

Wavelet trees seem to be unable to reach access and rank/select times of 
the form $O(\lg\lg\sigma)$, as other structures for representing sequences do [51], 
close to the lower bounds [12]. However, both have been combined to offer those 
time complexities and good zero-order compression of data and redundancy [7, 4, 12]. 
Yet, the lower bounds on some geometric problems [24], matched with current 
wavelet trees [16, 6], suggest that this combination cannot be carried out much 
further than those three operations. Still, there are some complex operations 
where it is not clear that wavelet trees have matched lower bounds [45]. 

We have described the wavelet tree as a static data structure. However, if the 
bitmaps or sequences stored at the nodes support insertions and deletions in time 
$\mathit{indel}(n)$, then the wavelet tree easily supports insertions and deletions in 
the sequence $S[1,n]$ it represents, in time $O(h \cdot \mathit{indel}(n))$, where $h$ is its 
height. This has been used to support indels in time 
$O((1 + \lg\sigma/\lg\lg n)\,\lg n/\lg\lg n)$ [61,88]. The 
alphabet, however, is still fixed in those solutions. While such a limitation may 
seem natural for sequences, it looks definitely artificial when representing grids: 
one can insert and delete new x-coordinates and points, but the y-coordinate 
universe cannot change. Creating or removing alphabet symbols requires changing 
the shape of the wavelet tree, and the bitmaps or sequences stored at the 
nodes undergo extensive modifications upon small tree shape changes (e.g., AVL 
rotations). Extending dynamism to support this type of update, with good time 
complexities at least in the amortized sense, is an important challenge for this 
data structure. It is also unclear what the dynamic lower bound is on a general 
alphabet; on a constant-size alphabet it is $\Theta(\lg n/\lg\lg n)$ [23]. Very recently, 
a dynamic scheme for a particular case (sequences of strings) was proposed [56]. 
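
The way an update to the sequence decomposes into one bitmap update per level can be sketched as follows. This is a hedged illustration with a fixed alphabet, not any of the cited dynamic structures: `Node`, `insert`, `delete`, and `access` are hypothetical names, and a plain Python list with linear-time rank and insertion stands in for a dynamic rank-enabled bitmap, so only the structure of the traversal (h bitmap updates per operation) is meaningful here, not the running time.

```python
class Node:
    """Wavelet tree node for symbols in the fixed range [lo, hi]."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.bits = []  # stand-in for a dynamic bitmap with rank support
        self.left = self.right = None

    def child(self, bit):
        """Return the child for `bit`, creating it lazily."""
        mid = (self.lo + self.hi) // 2
        if bit == 0:
            if self.left is None:
                self.left = Node(self.lo, mid)
            return self.left
        if self.right is None:
            self.right = Node(mid + 1, self.hi)
        return self.right


def insert(root, pos, symbol):
    """Insert `symbol` at 0-based position `pos`: one bitmap insertion per level."""
    node = root
    while node.lo < node.hi:
        mid = (node.lo + node.hi) // 2
        bit = 1 if symbol > mid else 0
        rank = node.bits[:pos].count(bit)  # rank_bit(B, pos), linear time here
        node.bits.insert(pos, bit)         # the indel on this node's bitmap
        node = node.child(bit)
        pos = rank


def delete(root, pos):
    """Delete the symbol at 0-based position `pos`: one bitmap deletion per level."""
    node = root
    while node.lo < node.hi:
        bit = node.bits[pos]
        rank = node.bits[:pos].count(bit)
        del node.bits[pos]
        node = node.left if bit == 0 else node.right
        pos = rank


def access(root, pos):
    """Return the symbol at 0-based position `pos`."""
    node = root
    while node.lo < node.hi:
        bit = node.bits[pos]
        pos = node.bits[:pos].count(bit)
        node = node.left if bit == 0 else node.right
    return node.lo
```

Note that the tree shape depends only on the range [lo, hi] given at construction, which mirrors the fixed-alphabet limitation discussed above: inserting a symbol outside that range is not supported by this sketch.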

A path that, in our opinion, has only started to be exploited is to enhance 
the wavelet tree with "one-dimensional" data structures at its nodes $v$, so that, 
by efficiently solving some kind of query over the corresponding subsequences $S_v$, 
we solve a more complex query on the original sequence $S$. In most cases 
throughout this survey, these one-dimensional queries have been rank and select on 
the bitmaps, but we have already shown some examples involving more complicated 
queries [44,87,83]. This approach may prove to be very fruitful. 
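
As one possible illustration of this idea (a hypothetical example chosen for simplicity, not necessarily the construction used in the cited works), each node can store prefix sums of weights over its subsequence. A two-dimensional weighted range sum then decomposes into one-dimensional prefix-sum queries on the nodes that cover the value range. The names `SumNode` and `range_sum` are assumptions of this sketch, and prefix-count arrays again stand in for rank-enabled bitmaps.

```python
class SumNode:
    """Wavelet tree node over (value, weight) pairs with values in [lo, hi].

    Besides the usual left/right routing information, each node keeps a
    one-dimensional structure: prefix sums of the weights of its subsequence.
    """

    def __init__(self, pairs, lo, hi):
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        # wsum[i] = total weight of the first i elements of this subsequence.
        self.wsum = [0]
        for _, w in pairs:
            self.wsum.append(self.wsum[-1] + w)
        if lo == hi or not pairs:
            return
        mid = (lo + hi) // 2
        # zeros[i] = how many of the first i elements go to the left child.
        self.zeros = [0]
        for v, _ in pairs:
            self.zeros.append(self.zeros[-1] + (v <= mid))
        self.left = SumNode([p for p in pairs if p[0] <= mid], lo, mid)
        self.right = SumNode([p for p in pairs if p[0] > mid], mid + 1, hi)

    def range_sum(self, l, r, a, b):
        """Sum of weights at positions [l, r) (0-based) with value in [a, b]."""
        if l >= r or b < self.lo or self.hi < a:
            return 0
        if a <= self.lo and self.hi <= b:
            # Node fully inside the value range: answer with the 1D structure.
            return self.wsum[r] - self.wsum[l]
        lz, rz = self.zeros[l], self.zeros[r]
        return (self.left.range_sum(lz, rz, a, b)
                + self.right.range_sum(l - lz, r - rz, a, b))
```

The traversal is the standard one for range counting; only the work done at the covering nodes changes, which is what makes the approach easy to adapt to other one-dimensional queries.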

In terms of practice, although there are many successful and publicly available 
implementations of wavelet tree variants (see, e.g., libcds.recoded.cl and 
http://www.uni-ulm.de/in/theo/research/sdsl.html), there are some challenges 
ahead, such as putting into practice the theoretical results that promise fast and 
small multiary wavelet trees [40, 50, 17] and lower redundancies [49, 91, 50]. 